Overview

Dataset statistics

Number of variables: 40
Number of observations: 159
Missing cells: 440
Missing cells (%): 6.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 232.1 KiB
Average record size in memory: 1.5 KiB
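
The derived figures in this table follow from the raw counts. As a quick check, the sketch below recomputes the missing-cell percentage and the average record size from the numbers reported above. The variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 40
n_observations = 159
missing_cells = 440
total_size_kib = 232.1

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
avg_record_kib = total_size_kib / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_kib:.1f} KiB")
```

Both results round to the values shown in the table (6.9% and 1.5 KiB).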

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" | Constant
ID_AGRAVO has constant value "B54" | Constant
NU_ANO has constant value "2016" | Constant
SG_UF has constant value "33" | Constant
ID_RG_RESI has constant value "" | Constant
ID_PAIS has constant value "1" | Constant
DEXAME has a high cardinality: 119 distinct values | High cardinality
SG_UF_NOT is highly correlated with ID_MUNICIP | High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT | High correlation
SEM_NOT is highly correlated with SEM_PRI | High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI | High correlation
SEM_PRI is highly correlated with SEM_NOT | High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP | High correlation
SEM_NOT is highly correlated with SEM_PRI | High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI | High correlation
SEM_PRI is highly correlated with SEM_NOT | High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP | High correlation
COUFINF is highly correlated with RESULT and 9 other fields | High correlation
PMM is highly correlated with ID_OCUPA_N and 2 other fields | High correlation
CS_RACA is highly correlated with ID_UNIDADE and 5 other fields | High correlation
RESULT is highly correlated with COUFINF and 13 other fields | High correlation
AT_SINTOMA is highly correlated with RESULT and 5 other fields | High correlation
ID_UNIDADE is highly correlated with CS_RACA and 8 other fields | High correlation
ID_REGIONA is highly correlated with ID_UNIDADE and 7 other fields | High correlation
SG_UF_NOT is highly correlated with ID_UNIDADE and 7 other fields | High correlation
SEM_NOT is highly correlated with CS_RACA and 1 other fields | High correlation
DTRATA is highly correlated with COUFINF and 13 other fields | High correlation
AT_LAMINA is highly correlated with RESULT and 5 other fields | High correlation
CS_ESCOL_N is highly correlated with ID_UNIDADE | High correlation
ID_OCUPA_N is highly correlated with COUFINF and 13 other fields | High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields | High correlation
CLASSI_FIN is highly correlated with COUFINF and 9 other fields | High correlation
LOC_INF is highly correlated with COUFINF and 15 other fields | High correlation
COPAISINF is highly correlated with RESULT and 8 other fields | High correlation
DSTRAESQUE is highly correlated with COUFINF and 11 other fields | High correlation
TPAUTOCTO is highly correlated with COUFINF and 10 other fields | High correlation
CS_GESTANT is highly correlated with CS_SEXO | High correlation
TRA_ESQUEM is highly correlated with COUFINF and 8 other fields | High correlation
AT_ATIVIDA is highly correlated with RESULT and 7 other fields | High correlation
CS_SEXO is highly correlated with ID_OCUPA_N and 1 other fields | High correlation
PCRUZ is highly correlated with COUFINF and 11 other fields | High correlation
ID_MN_RESI is highly correlated with CS_RACA and 9 other fields | High correlation
COUFINF is highly correlated with DTRATA and 9 other fields | High correlation
ID_REGIONA is highly correlated with DTRATA and 8 other fields | High correlation
DTRATA is highly correlated with COUFINF and 15 other fields | High correlation
CS_ESCOL_N is highly correlated with ID_PAIS and 5 other fields | High correlation
ID_OCUPA_N is highly correlated with ID_PAIS and 5 other fields | High correlation
DSTRAESQUE is highly correlated with ID_REGIONA and 10 other fields | High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields | High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields | High correlation
CS_SEXO is highly correlated with ID_PAIS and 6 other fields | High correlation
LOC_INF is highly correlated with COUFINF and 10 other fields | High correlation
SG_UF is highly correlated with COUFINF and 24 other fields | High correlation
CS_RACA is highly correlated with ID_PAIS and 5 other fields | High correlation
RESULT is highly correlated with DTRATA and 11 other fields | High correlation
AT_SINTOMA is highly correlated with ID_PAIS and 9 other fields | High correlation
SG_UF_NOT is highly correlated with DTRATA and 8 other fields | High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields | High correlation
AT_LAMINA is highly correlated with ID_PAIS and 9 other fields | High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields | High correlation
TPAUTOCTO is highly correlated with COUFINF and 12 other fields | High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields | High correlation
CS_GESTANT is highly correlated with ID_PAIS and 6 other fields | High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields | High correlation
TRA_ESQUEM is highly correlated with DTRATA and 9 other fields | High correlation
AT_ATIVIDA is highly correlated with ID_PAIS and 8 other fields | High correlation
CLASSI_FIN is highly correlated with ID_PAIS and 12 other fields | High correlation
PCRUZ is highly correlated with DTRATA and 9 other fields | High correlation
DT_INVEST has 159 (100.0%) missing values | Missing
PMM has 121 (76.1%) missing values | Missing
DT_ENCERRA has 159 (100.0%) missing values | Missing
DEXAME is uniformly distributed | Uniform
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis | Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis | Unsupported
COPAISINF has 115 (72.3%) zeros | Zeros
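
The "Constant" and "Missing" warnings above identify columns that carry no information for analysis. The following is a minimal sketch of how such columns might be detected before modelling. The function name `rejectable_columns` and the three toy rows are hypothetical; only the column names come from the report, and this is not how pandas-profiling itself is implemented.

```python
def rejectable_columns(rows):
    """Return (constant, all_missing) column-name lists for a list of dict rows."""
    columns = list(rows[0].keys())
    constant, all_missing = [], []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        if not present:
            # Every value is missing, as reported for DT_INVEST.
            all_missing.append(col)
        elif len(present) == len(values) and len(set(present)) == 1:
            # A single distinct value with nothing missing, as reported for TP_NOT.
            constant.append(col)
    return constant, all_missing


# Toy rows for illustration only; they are not taken from the dataset.
rows = [
    {"TP_NOT": "2", "CS_SEXO": "F", "DT_INVEST": None},
    {"TP_NOT": "2", "CS_SEXO": "M", "DT_INVEST": None},
    {"TP_NOT": "2", "CS_SEXO": "F", "DT_INVEST": None},
]

constant, all_missing = rejectable_columns(rows)
print("constant:", constant)
print("all missing:", all_missing)
```

Note that a constant column also explains many of the "High correlation" warnings for TP_NOT, ID_AGRAVO, NU_ANO, SG_UF, ID_RG_RESI and ID_PAIS: association measures are degenerate when one side never varies.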

Reproduction

Analysis started: 2021-07-06 18:52:47.627378
Analysis finished: 2021-07-06 18:53:07.912320
Duration: 20.28 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
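
The reported duration is the difference between the two timestamps. A small sketch using only the standard library confirms the figure; the format string is an assumption about how the timestamps were printed.

```python
from datetime import datetime

# Timestamp layout assumed from the two values shown in the report.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-07-06 18:52:47.627378", fmt)
finished = datetime.strptime("2021-07-06 18:53:07.912320", fmt)

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```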

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value | Count
2 | 159

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 159
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2 | 159 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 159 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 159 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 159 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 159 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 159 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 159 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 159 | 100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

Value | Count
B54 | 159

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 477
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B54
2nd row: B54
3rd row: B54
4th row: B54
5th row: B54

Common Values

Value | Count | Frequency (%)
B54 | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
b54 | 159 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
B | 159 | 33.3%
5 | 159 | 33.3%
4 | 159 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 318 | 66.7%
Uppercase Letter | 159 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 159 | 50.0%
4 | 159 | 50.0%
Uppercase Letter
Value | Count | Frequency (%)
B | 159 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 318 | 66.7%
Latin | 159 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
5 | 159 | 50.0%
4 | 159 | 50.0%
Latin
Value | Count | Frequency (%)
B | 159 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 477 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
B | 159 | 33.3%
5 | 159 | 33.3%
4 | 159 | 33.3%
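
The category counts reported for ID_AGRAVO can be reproduced with Python's standard `unicodedata` module. This is only a sketch: the standard library returns two-letter category abbreviations ("Nd" for Decimal Number, "Lu" for Uppercase Letter) rather than the long names shown in the report, and it does not expose script or block properties. The list of 159 copies of "B54" stands in for the column, which the report describes as constant.

```python
import unicodedata
from collections import Counter

# Stand-in for the column: the report states that all 159 values are "B54".
values = ["B54"] * 159

# Count the Unicode general category of every character in every value.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

print(category_counts)
```

The two digits per value give 318 decimal-number characters and the single letter gives 159 uppercase letters, which matches the 66.7% and 33.3% split above.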

DateTime

Distinct: 121
Distinct (%): 76.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
Minimum: 2016-01-04 00:00:00
Maximum: 2016-12-29 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 46
Distinct (%): 28.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201621.7799
Minimum: 201601
Maximum: 201652
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 201601
5-th percentile: 201603
Q1: 201607
median: 201618
Q3: 201635
95-th percentile: 201650
Maximum: 201652
Range: 51
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 15.86080584
Coefficient of variation (CV): 7.866613346 × 10^-5
Kurtosis: -1.114233905
Mean: 201621.7799
Median Absolute Deviation (MAD): 13
Skewness: 0.4739817247
Sum: 32057863
Variance: 251.565162
Monotonicity: Increasing
Histogram with fixed size bins (bins=46)

Common values

Value | Count | Frequency (%)
201605 | 14 | 8.8%
201611 | 10 | 6.3%
201604 | 10 | 6.3%
201623 | 7 | 4.4%
201614 | 7 | 4.4%
201618 | 6 | 3.8%
201607 | 6 | 3.8%
201602 | 5 | 3.1%
201628 | 5 | 3.1%
201638 | 4 | 2.5%
Other values (36) | 85 | 53.5%

Minimum values

Value | Count | Frequency (%)
201601 | 1 | 0.6%
201602 | 5 | 3.1%
201603 | 4 | 2.5%
201604 | 10 | 6.3%
201605 | 14 | 8.8%
201606 | 4 | 2.5%
201607 | 6 | 3.8%
201608 | 1 | 0.6%
201609 | 4 | 2.5%
201610 | 3 | 1.9%

Maximum values

Value | Count | Frequency (%)
201652 | 2 | 1.3%
201651 | 4 | 2.5%
201650 | 4 | 2.5%
201649 | 3 | 1.9%
201648 | 1 | 0.6%
201647 | 3 | 1.9%
201646 | 3 | 1.9%
201645 | 3 | 1.9%
201644 | 2 | 1.3%
201643 | 1 | 0.6%
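
SEM_NOT is profiled as a real number, but its values look like a year followed by a two-digit week (for example 201605). Under that assumption, which the report itself does not state, the sketch below splits a code into year and week and recomputes the coefficient of variation from the reported mean and standard deviation. The helper name `split_epi_week` is illustrative.

```python
def split_epi_week(code):
    """Split an integer such as 201605 into (year, week), assuming a YYYYWW layout."""
    return divmod(code, 100)


minimum, maximum = 201601, 201652
first_week = split_epi_week(minimum)
last_week = split_epi_week(maximum)
print(first_week, last_week)

# Coefficient of variation = standard deviation / mean, using the reported figures.
mean, std = 201621.7799, 15.86080584
cv = std / mean
print(f"CV: {cv:.3e}")
```

The very small CV mostly reflects the constant 2016 prefix inflating the mean, so it says little about how spread out the weeks are.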

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
2016 | 159

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 636
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016
2nd row: 2016
3rd row: 2016
4th row: 2016
5th row: 2016

Common Values

Value | Count | Frequency (%)
2016 | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2016 | 159 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 159 | 25.0%
0 | 159 | 25.0%
1 | 159 | 25.0%
6 | 159 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 636 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 159 | 25.0%
0 | 159 | 25.0%
1 | 159 | 25.0%
6 | 159 | 25.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 636 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 159 | 25.0%
0 | 159 | 25.0%
1 | 159 | 25.0%
6 | 159 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 636 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 159 | 25.0%
0 | 159 | 25.0%
1 | 159 | 25.0%
6 | 159 | 25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 KiB

Value | Count
33 | 158
31 | 1

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 318
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: 33
2nd row: 33
3rd row: 33
4th row: 33
5th row: 33

Common Values

Value | Count | Frequency (%)
33 | 158 | 99.4%
31 | 1 | 0.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
33 | 158 | 99.4%
31 | 1 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
3 | 317 | 99.7%
1 | 1 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 318 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 317 | 99.7%
1 | 1 | 0.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 318 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 317 | 99.7%
1 | 1 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 318 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 317 | 99.7%
1 | 1 | 0.3%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 10
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330296.4654
Minimum: 310620
Maximum: 330630
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 310620
5-th percentile: 330240
Q1: 330455
median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330630
Range: 20010
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1572.635391
Coefficient of variation (CV): 0.004761284348
Kurtosis: 158.0483816
Mean: 330296.4654
Median Absolute Deviation (MAD): 0
Skewness: -12.55361115
Sum: 52517138
Variance: 2473182.073
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
330455 | 129 | 81.1%
330240 | 16 | 10.1%
330340 | 6 | 3.8%
330330 | 2 | 1.3%
330200 | 1 | 0.6%
330420 | 1 | 0.6%
330630 | 1 | 0.6%
310620 | 1 | 0.6%
330023 | 1 | 0.6%
330010 | 1 | 0.6%

Minimum values

Value | Count | Frequency (%)
310620 | 1 | 0.6%
330010 | 1 | 0.6%
330023 | 1 | 0.6%
330200 | 1 | 0.6%
330240 | 16 | 10.1%
330330 | 2 | 1.3%
330340 | 6 | 3.8%
330420 | 1 | 0.6%
330455 | 129 | 81.1%
330630 | 1 | 0.6%

Maximum values

Value | Count | Frequency (%)
330630 | 1 | 0.6%
330455 | 129 | 81.1%
330420 | 1 | 0.6%
330340 | 6 | 3.8%
330330 | 2 | 1.3%
330240 | 16 | 10.1%
330200 | 1 | 0.6%
330023 | 1 | 0.6%
330010 | 1 | 0.6%
310620 | 1 | 0.6%
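
The reported correlation between ID_MUNICIP and SG_UF_NOT is what one would expect if the municipality codes embed the state code in their first two digits. That layout is an assumption here (the report does not document the coding scheme). Under it, the sketch below derives state counts from the municipality frequency table above; the helper name `uf_from_municipality` is illustrative.

```python
def uf_from_municipality(code):
    """Return the leading two digits of a six-digit municipality code."""
    return code // 10000


# Value -> count, copied from the ID_MUNICIP frequency table.
municipality_counts = {
    330455: 129, 330240: 16, 330340: 6, 330330: 2, 330200: 1,
    330420: 1, 330630: 1, 310620: 1, 330023: 1, 330010: 1,
}

uf_counts = {}
for code, count in municipality_counts.items():
    uf = uf_from_municipality(code)
    uf_counts[uf] = uf_counts.get(uf, 0) + count

print(uf_counts)
```

The result (158 records for 33 and 1 record for 31) matches the SG_UF_NOT frequencies, which suggests both columns are identifiers rather than quantities, so mean, skewness and similar statistics are not meaningful for them.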

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
(empty) | 158
1449 | 1

Length

Max length: 4
Median length: 0
Mean length: 0.0251572327
Min length: 0

Characters and Unicode

Total characters: 4
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: (empty)
2nd row: (empty)
3rd row: (empty)
4th row: (empty)
5th row: (empty)

Common Values

Value | Count | Frequency (%)
(empty) | 158 | 99.4%
1449 | 1 | 0.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1449 | 1 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
4 | 2 | 50.0%
1 | 1 | 25.0%
9 | 1 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 2 | 50.0%
1 | 1 | 25.0%
9 | 1 | 25.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
4 | 2 | 50.0%
1 | 1 | 25.0%
9 | 1 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
4 | 2 | 50.0%
1 | 1 | 25.0%
9 | 1 | 25.0%

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): 19.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3091073.686
Minimum: 12580
Maximum: 7874847
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 12580
5-th percentile: 2270017.7
Q1: 2288338
median: 2288338
Q3: 3014463.5
95-th percentile: 6919626
Maximum: 7874847
Range: 7862267
Interquartile range (IQR): 726125.5

Descriptive statistics

Standard deviation: 1719739.924
Coefficient of variation (CV): 0.5563568192
Kurtosis: 1.483097919
Mean: 3091073.686
Median Absolute Deviation (MAD): 0
Skewness: 1.534845034
Sum: 491480716
Variance: 2.957505405 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Common values

Value | Count | Frequency (%)
2288338 | 82 | 51.6%
5462886 | 11 | 6.9%
2276534 | 10 | 6.3%
6919626 | 8 | 5.0%
3187837 | 6 | 3.8%
3005992 | 6 | 3.8%
7740476 | 6 | 3.8%
2271877 | 4 | 2.5%
2269988 | 2 | 1.3%
6043941 | 2 | 1.3%
Other values (21) | 22 | 13.8%

Minimum values

Value | Count | Frequency (%)
12580 | 1 | 0.6%
12734 | 1 | 0.6%
25143 | 1 | 0.6%
27049 | 1 | 0.6%
2269341 | 1 | 0.6%
2269783 | 1 | 0.6%
2269988 | 2 | 1.3%
2270021 | 1 | 0.6%
2271877 | 4 | 2.5%
2276534 | 10 | 6.3%

Maximum values

Value | Count | Frequency (%)
7874847 | 1 | 0.6%
7740476 | 6 | 3.8%
6919626 | 8 | 5.0%
6200702 | 1 | 0.6%
6043941 | 2 | 1.3%
5462886 | 11 | 6.9%
3810348 | 1 | 0.6%
3375471 | 1 | 0.6%
3187837 | 6 | 3.8%
3042081 | 1 | 0.6%

DateTime

Distinct: 113
Distinct (%): 71.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
Minimum: 2001-07-14 00:00:00
Maximum: 2016-12-27 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 51
Distinct (%): 32.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201608.2138
Minimum: 200128
Maximum: 201652
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 200128
5-th percentile: 201601
Q1: 201604
median: 201614
Q3: 201631.5
95-th percentile: 201648.1
Maximum: 201652
Range: 1524
Interquartile range (IQR): 27.5

Descriptive statistics

Standard deviation: 119.7641334
Coefficient of variation (CV): 0.0005940439189
Kurtosis: 150.3465203
Mean: 201608.2138
Median Absolute Deviation (MAD): 12
Skewness: -12.09783496
Sum: 32055706
Variance: 14343.44766
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
201602 | 15 | 9.4%
201627 | 11 | 6.9%
201603 | 8 | 5.0%
201614 | 7 | 4.4%
201604 | 7 | 4.4%
201622 | 6 | 3.8%
201646 | 6 | 3.8%
201609 | 5 | 3.1%
201605 | 5 | 3.1%
201612 | 5 | 3.1%
Other values (41) | 84 | 52.8%

Minimum values

Value | Count | Frequency (%)
200128 | 1 | 0.6%
201549 | 1 | 0.6%
201550 | 1 | 0.6%
201551 | 2 | 1.3%
201552 | 1 | 0.6%
201601 | 5 | 3.1%
201602 | 15 | 9.4%
201603 | 8 | 5.0%
201604 | 7 | 4.4%
201605 | 5 | 3.1%

Maximum values

Value | Count | Frequency (%)
201652 | 1 | 0.6%
201651 | 3 | 1.9%
201650 | 3 | 1.9%
201649 | 1 | 0.6%
201648 | 2 | 1.3%
201646 | 6 | 3.8%
201645 | 2 | 1.3%
201644 | 2 | 1.3%
201643 | 3 | 1.9%
201642 | 1 | 0.6%
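
The extreme kurtosis and skewness of SEM_PRI come from a handful of low values, most visibly 200128. One common way to flag such values is Tukey's 1.5 × IQR rule, which is sketched below using the quartiles reported above. The rule is swapped in here for illustration; the report does not apply it.

```python
# Quartiles copied from the SEM_PRI quantile statistics.
q1, q3 = 201604, 201631.5
iqr = q3 - q1

# Tukey fences: anything outside [lower, upper] is flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr


def is_outlier(value):
    return value < lower or value > upper


candidates = [200128, 201549, 201601, 201652]
flags = {value: is_outlier(value) for value in candidates}
print(lower, upper)
print(flags)
```

Note that 201549 is flagged as well. If the values are year-plus-week codes (an assumption), the jump from 201552 to 201601 is an artefact of the encoding, so numeric outlier rules on this column should be read with care.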

DateTime

Distinct: 138
Distinct (%): 87.3%
Missing: 1
Missing (%): 0.6%
Memory size: 1.4 KiB
Minimum: 1927-03-31 00:00:00
Maximum: 2016-09-14 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

Distinct: 60
Distinct (%): 37.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4034.559748
Minimum: 3002
Maximum: 4088
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 3002
5-th percentile: 4015.8
Q1: 4029
median: 4039
Q3: 4052
95-th percentile: 4069.3
Maximum: 4088
Range: 1086
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 84.07121689
Coefficient of variation (CV): 0.02083776722
Kurtosis: 146.5411281
Mean: 4034.559748
Median Absolute Deviation (MAD): 12
Skewness: -11.86230724
Sum: 641495
Variance: 7067.969509
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
4040 | 14 | 8.8%
4036 | 9 | 5.7%
4026 | 8 | 5.0%
4051 | 7 | 4.4%
4034 | 6 | 3.8%
4027 | 6 | 3.8%
4029 | 6 | 3.8%
4032 | 5 | 3.1%
4035 | 4 | 2.5%
4041 | 4 | 2.5%
Other values (50) | 90 | 56.6%

Minimum values

Value | Count | Frequency (%)
3002 | 1 | 0.6%
4001 | 1 | 0.6%
4005 | 1 | 0.6%
4009 | 3 | 1.9%
4013 | 1 | 0.6%
4014 | 1 | 0.6%
4016 | 1 | 0.6%
4018 | 2 | 1.3%
4019 | 3 | 1.9%
4020 | 1 | 0.6%

Maximum values

Value | Count | Frequency (%)
4088 | 1 | 0.6%
4082 | 1 | 0.6%
4081 | 1 | 0.6%
4080 | 1 | 0.6%
4079 | 1 | 0.6%
4077 | 1 | 0.6%
4073 | 1 | 0.6%
4072 | 1 | 0.6%
4069 | 3 | 1.9%
4068 | 2 | 1.3%
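
The values of NU_IDADE_N cluster between 4001 and 4088 with a single 3002, which suggests a composite code rather than an age in plain units. The sketch below assumes a layout in which the leading digit is a unit (1 = hours, 2 = days, 3 = months, 4 = years) and the remaining three digits are the amount. This mapping is an assumption that should be checked against the dataset's data dictionary; it is not stated anywhere in the report.

```python
# Assumed unit codes; verify against the data dictionary before relying on them.
UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}


def decode_age(code):
    """Split a composite age code into (amount, unit) under the assumed layout."""
    unit_digit, amount = divmod(code, 1000)
    return amount, UNITS[unit_digit]


youngest = decode_age(3002)
most_common = decode_age(4040)
oldest = decode_age(4088)
print(youngest, most_common, oldest)
```

If the assumption holds, the summary statistics above (mean, skewness, kurtosis) mix units and are dominated by the unit digit, so ages should be decoded before any numeric analysis.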

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
M | 102
F | 57

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 159
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: M
4th row: F
5th row: M

Common Values

Value | Count | Frequency (%)
M | 102 | 64.2%
F | 57 | 35.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
m | 102 | 64.2%
f | 57 | 35.8%

Most occurring characters

Value | Count | Frequency (%)
M | 102 | 64.2%
F | 57 | 35.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 159 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 102 | 64.2%
F | 57 | 35.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 159 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 102 | 64.2%
F | 57 | 35.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 159 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 102 | 64.2%
F | 57 | 35.8%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value | Count
6 | 109
5 | 37
9 | 13

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 159
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5
2nd row: 5
3rd row: 6
4th row: 6
5th row: 6

Common Values

Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

Most occurring characters

Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 159 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 159 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 159 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
6 | 109 | 68.6%
5 | 37 | 23.3%
9 | 13 | 8.2%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
1 | 63
9 | 63
2 | 17
4 | 14
(empty) | 2

Length

Max length: 1
Median length: 1
Mean length: 0.9874213836
Min length: 0

Characters and Unicode

Total characters: 157
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9
2nd row: 2
3rd row: 1
4th row: 2
5th row: 9

Common Values

Value | Count | Frequency (%)
1 | 63 | 39.6%
9 | 63 | 39.6%
2 | 17 | 10.7%
4 | 14 | 8.8%
(empty) | 2 | 1.3%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 63 | 40.1%
9 | 63 | 40.1%
2 | 17 | 10.8%
4 | 14 | 8.9%

Most occurring characters

Value | Count | Frequency (%)
9 | 63 | 40.1%
1 | 63 | 40.1%
2 | 17 | 10.8%
4 | 14 | 8.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 157 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
9 | 63 | 40.1%
1 | 63 | 40.1%
2 | 17 | 10.8%
4 | 14 | 8.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 157 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
9 | 63 | 40.1%
1 | 63 | 40.1%
2 | 17 | 10.8%
4 | 14 | 8.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 157 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
9 | 63 | 40.1%
1 | 63 | 40.1%
2 | 17 | 10.8%
4 | 14 | 8.9%
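
The same count appears with two different percentages in the CS_RACA tables (for example, 63 is shown as both 39.6% and 40.1%). The difference is the denominator: the Common Values table divides by all 159 rows, while the character-level tables exclude the 2 empty strings and divide by 157. The sketch below reproduces both sets of figures from the reported counts; the variable names are illustrative.

```python
# Value -> count, copied from the CS_RACA Common Values table ("" is the empty string).
counts = {"1": 63, "9": 63, "2": 17, "4": 14, "": 2}

total_rows = sum(counts.values())
non_empty_rows = total_rows - counts[""]

share_of_all = {v: f"{100 * n / total_rows:.1f}%" for v, n in counts.items()}
share_of_non_empty = {
    v: f"{100 * n / non_empty_rows:.1f}%" for v, n in counts.items() if v != ""
}

print(total_rows, non_empty_rows)
print(share_of_all)
print(share_of_non_empty)
```

Because the empty strings are not counted as missing (Missing is 0), they would need to be converted to nulls before the column's missingness is reported correctly.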

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 11
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 KiB

Value | Count
08 | 78
09 | 32
03 | 12
06 | 9
05 | 8
Other values (6) | 20

Length

Max length: 2
Median length: 2
Mean length: 1.937106918
Min length: 0

Characters and Unicode

Total characters: 308
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 09
2nd row: 09
3rd row: 07
4th row: 02
5th row: 09

Common Values

Value | Count | Frequency (%)
08 | 78 | 49.1%
09 | 32 | 20.1%
03 | 12 | 7.5%
06 | 9 | 5.7%
05 | 8 | 5.0%
(empty) | 5 | 3.1%
07 | 4 | 2.5%
04 | 4 | 2.5%
10 | 3 | 1.9%
02 | 2 | 1.3%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
08 | 78 | 50.6%
09 | 32 | 20.8%
03 | 12 | 7.8%
06 | 9 | 5.8%
05 | 8 | 5.2%
07 | 4 | 2.6%
04 | 4 | 2.6%
10 | 3 | 1.9%
02 | 2 | 1.3%
01 | 2 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 154 | 50.0%
8 | 78 | 25.3%
9 | 32 | 10.4%
3 | 12 | 3.9%
6 | 9 | 2.9%
5 | 8 | 2.6%
1 | 5 | 1.6%
7 | 4 | 1.3%
4 | 4 | 1.3%
2 | 2 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 308 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 154 | 50.0%
8 | 78 | 25.3%
9 | 32 | 10.4%
3 | 12 | 3.9%
6 | 9 | 2.9%
5 | 8 | 2.6%
1 | 5 | 1.6%
7 | 4 | 1.3%
4 | 4 | 1.3%
2 | 2 | 0.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 308 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 154 | 50.0%
8 | 78 | 25.3%
9 | 32 | 10.4%
3 | 12 | 3.9%
6 | 9 | 2.9%
5 | 8 | 2.6%
1 | 5 | 1.6%
7 | 4 | 1.3%
4 | 4 | 1.3%
2 | 2 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 308 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 154 | 50.0%
8 | 78 | 25.3%
9 | 32 | 10.4%
3 | 12 | 3.9%
6 | 9 | 2.9%
5 | 8 | 2.6%
1 | 5 | 1.6%
7 | 4 | 1.3%
4 | 4 | 1.3%
2 | 2 | 0.6%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 KiB

Value | Count
33 | 159

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 318
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 33
2nd row: 33
3rd row: 33
4th row: 33
5th row: 33

Common Values

Value | Count | Frequency (%)
33 | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
33 | 159 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
3 | 318 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 318 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 318 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 318 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 318 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 318 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 318 | 100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 23
Distinct (%): 14.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330397.7107
Minimum: 330010
Maximum: 330630
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 330010
5-th percentile: 330166
Q1: 330350
median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330630
Range: 620
Interquartile range (IQR): 105

Descriptive statistics

Standard deviation: 115.7850117
Coefficient of variation (CV): 0.0003504413256
Kurtosis: 1.899937145
Mean: 330397.7107
Median Absolute Deviation (MAD): 0
Skewness: -1.581084296
Sum: 52533236
Variance: 13406.16894
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Common values

Value | Count | Frequency (%)
330455 | 105 | 66.0%
330240 | 15 | 9.4%
330340 | 6 | 3.8%
330170 | 4 | 2.5%
330510 | 3 | 1.9%
330350 | 3 | 1.9%
330330 | 3 | 1.9%
330070 | 2 | 1.3%
330452 | 2 | 1.3%
330010 | 2 | 1.3%
Other values (13) | 14 | 8.8%

Minimum values

Value | Count | Frequency (%)
330010 | 2 | 1.3%
330023 | 1 | 0.6%
330070 | 2 | 1.3%
330110 | 1 | 0.6%
330130 | 2 | 1.3%
330170 | 4 | 2.5%
330190 | 1 | 0.6%
330200 | 1 | 0.6%
330240 | 15 | 9.4%
330245 | 1 | 0.6%

Maximum values

Value | Count | Frequency (%)
330630 | 1 | 0.6%
330580 | 1 | 0.6%
330555 | 1 | 0.6%
330510 | 3 | 1.9%
330490 | 1 | 0.6%
330455 | 105 | 66.0%
330452 | 2 | 1.3%
330420 | 1 | 0.6%
330414 | 1 | 0.6%
330390 | 1 | 0.6%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
(empty) | 159

Length

Max length: 0
Median length: 0
Mean length: 0
Min length: 0

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (empty)
2nd row: (empty)
3rd row: (empty)
4th row: (empty)
5th row: (empty)

Common Values

Value | Count | Frequency (%)
(empty) | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
No values found.

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value | Count
1 | 159

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 159
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 159 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 159 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 159 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 159 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 159 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 159 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 159 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 159 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 159 | 100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 159
Missing (%): 100.0%
Memory size: 1.4 KiB

ID_OCUPA_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 46
Distinct (%): 28.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Value | Count
(empty) | 66
263110 | 13
999991 | 11
241005 | 5
223115 | 5
Other values (41) | 59

Length

Max length: 6
Median length: 6
Mean length: 3.509433962
Min length: 0

Characters and Unicode

Total characters: 558
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 21.4%

Sample

1st row: 999991
2nd row: 999992
3rd row: 263110
4th row: 999991
5th row: (empty)

Common Values

Value | Count | Frequency (%)
(empty) | 66 | 41.5%
263110 | 13 | 8.2%
999991 | 11 | 6.9%
241005 | 5 | 3.1%
223115 | 5 | 3.1%
999992 | 5 | 3.1%
214205 | 5 | 3.1%
999993 | 5 | 3.1%
914405 | 4 | 2.5%
221105 | 2 | 1.3%
Other values (36) | 38 | 23.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
263110 | 13 | 14.0%
999991 | 11 | 11.8%
214205 | 5 | 5.4%
999993 | 5 | 5.4%
223115 | 5 | 5.4%
241005 | 5 | 5.4%
999992 | 5 | 5.4%
914405 | 4 | 4.3%
221105 | 2 | 2.2%
261125 | 2 | 2.2%
Other values (35) | 36 | 38.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 119 | 21.3%
9 | 109 | 19.5%
2 | 92 | 16.5%
0 | 68 | 12.2%
5 | 61 | 10.9%
3 | 39 | 7.0%
4 | 35 | 6.3%
6 | 23 | 4.1%
7 | 9 | 1.6%
8 | 3 | 0.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 558 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 119 | 21.3%
9 | 109 | 19.5%
2 | 92 | 16.5%
0 | 68 | 12.2%
5 | 61 | 10.9%
3 | 39 | 7.0%
4 | 35 | 6.3%
6 | 23 | 4.1%
7 | 9 | 1.6%
8 | 3 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Common | 558 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 119 | 21.3%
9 | 109 | 19.5%
2 | 92 | 16.5%
0 | 68 | 12.2%
5 | 61 | 10.9%
3 | 39 | 7.0%
4 | 35 | 6.3%
6 | 23 | 4.1%
7 | 9 | 1.6%
8 | 3 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 558 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 119 | 21.3%
9 | 109 | 19.5%
2 | 92 | 16.5%
0 | 68 | 12.2%
5 | 61 | 10.9%
3 | 39 | 7.0%
4 | 35 | 6.3%
6 | 23 | 4.1%
7 | 9 | 1.6%
8 | 3 | 0.5%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
2
113 
1
44 
8
 
1
 
1

Length

Max length1
Median length1
Mean length0.9937106918
Min length0

Characters and Unicode

Total characters158
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)1.3%

Sample

1st row1
2nd row2
3rd row2
4th row2
5th row2

Common Values

Value    Count  Frequency (%)
2        113    71.1%
1        44     27.7%
8        1      0.6%
(empty)  1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2113
71.5%
144
 
27.8%
81
 
0.6%

Most occurring characters

ValueCountFrequency (%)
2113
71.5%
144
 
27.8%
81
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number158
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2113
71.5%
144
 
27.8%
81
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common158
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2113
71.5%
144
 
27.8%
81
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2113
71.5%
144
 
27.8%
81
 
0.6%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct12
Distinct (%)7.5%
Missing0
Missing (%)0.0%
Memory size9.6 KiB

Length

Max length2
Median length2
Mean length1.72327044
Min length0

Characters and Unicode

Total characters274
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)3.1%

Sample

1st row4
2nd row10
3rd row11
4th row11
5th row99

Common Values

Value             Count  Frequency (%)
10                55     34.6%
11                54     34.0%
4                 25     15.7%
3                 7      4.4%
99                6      3.8%
9                 5      3.1%
2                 2      1.3%
1                 1      0.6%
8                 1      0.6%
5                 1      0.6%
Other values (2)  2      1.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1055
34.8%
1154
34.2%
425
15.8%
37
 
4.4%
996
 
3.8%
95
 
3.2%
22
 
1.3%
11
 
0.6%
81
 
0.6%
51
 
0.6%

Most occurring characters

ValueCountFrequency (%)
1165
60.2%
055
 
20.1%
425
 
9.1%
917
 
6.2%
37
 
2.6%
23
 
1.1%
81
 
0.4%
51
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number274
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1165
60.2%
055
 
20.1%
425
 
9.1%
917
 
6.2%
37
 
2.6%
23
 
1.1%
81
 
0.4%
51
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common274
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1165
60.2%
055
 
20.1%
425
 
9.1%
917
 
6.2%
37
 
2.6%
23
 
1.1%
81
 
0.4%
51
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII274
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1165
60.2%
055
 
20.1%
425
 
9.1%
917
 
6.2%
37
 
2.6%
23
 
1.1%
81
 
0.4%
51
 
0.4%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size10.4 KiB

Length

Max length1
Median length1
Mean length0.9937106918
Min length0

Characters and Unicode

Total characters158
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row2

Common Values

Value    Count  Frequency (%)
1        130    81.8%
3        15     9.4%
2        13     8.2%
(empty)  1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1130
82.3%
315
 
9.5%
213
 
8.2%

Most occurring characters

ValueCountFrequency (%)
1130
82.3%
315
 
9.5%
213
 
8.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number158
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1130
82.3%
315
 
9.5%
213
 
8.2%

Most occurring scripts

ValueCountFrequency (%)
Common158
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1130
82.3%
315
 
9.5%
213
 
8.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1130
82.3%
315
 
9.5%
213
 
8.2%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.9%
Missing0
Missing (%)0.0%
Memory size10.4 KiB

Length

Max length1
Median length1
Mean length0.9937106918
Min length0

Characters and Unicode

Total characters158
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row1
2nd row1
3rd row1
4th row2
5th row2

Common Values

Value    Count  Frequency (%)
1        132    83.0%
2        26     16.4%
(empty)  1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1132
83.5%
226
 
16.5%

Most occurring characters

ValueCountFrequency (%)
1132
83.5%
226
 
16.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number158
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1132
83.5%
226
 
16.5%

Most occurring scripts

ValueCountFrequency (%)
Common158
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1132
83.5%
226
 
16.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1132
83.5%
226
 
16.5%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.9%
Missing0
Missing (%)0.0%
Memory size9.8 KiB

Length

Max length1
Median length0
Mean length0.2767295597
Min length0

Characters and Unicode

Total characters44
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  115    72.3%
2        38     23.9%
1        6      3.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
238
86.4%
16
 
13.6%

Most occurring characters

ValueCountFrequency (%)
238
86.4%
16
 
13.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number44
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
238
86.4%
16
 
13.6%

Most occurring scripts

ValueCountFrequency (%)
Common44
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
238
86.4%
16
 
13.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII44
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
238
86.4%
16
 
13.6%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)3.8%
Missing0
Missing (%)0.0%
Memory size9.9 KiB

Length

Max length2
Median length0
Mean length0.3773584906
Min length0

Characters and Unicode

Total characters60
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.9%

Sample

1st rowAM
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  129    81.1%
RJ       16     10.1%
AM       11     6.9%
RO       1      0.6%
SP       1      0.6%
AC       1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
rj16
53.3%
am11
36.7%
ro1
 
3.3%
ac1
 
3.3%
sp1
 
3.3%

Most occurring characters

ValueCountFrequency (%)
R17
28.3%
J16
26.7%
A12
20.0%
M11
18.3%
S1
 
1.7%
P1
 
1.7%
O1
 
1.7%
C1
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter60
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R17
28.3%
J16
26.7%
A12
20.0%
M11
18.3%
S1
 
1.7%
P1
 
1.7%
O1
 
1.7%
C1
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin60
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R17
28.3%
J16
26.7%
A12
20.0%
M11
18.3%
S1
 
1.7%
P1
 
1.7%
O1
 
1.7%
C1
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII60
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R17
28.3%
J16
26.7%
A12
20.0%
M11
18.3%
S1
 
1.7%
P1
 
1.7%
O1
 
1.7%
C1
 
1.7%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct10
Distinct (%)6.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7.786163522
Minimum0
Maximum199
Zeros115
Zeros (%)72.3%
Negative0
Negative (%)0.0%
Memory size1.4 KiB

Quantile statistics

Minimum                    0
5-th percentile            0
Q1                         0
median                     0
Q3                         1
95-th percentile           31
Maximum                    199
Range                      199
Interquartile range (IQR)  1

Descriptive statistics

Standard deviation               32.27264484
Coefficient of variation (CV)    4.144871187
Kurtosis                         22.84517243
Mean                             7.786163522
Median Absolute Deviation (MAD)  0
Skewness                         4.818630712
Sum                              1238
Variance                         1041.523605
Monotonicity                     Not monotonic
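
Several of the figures above are derived from one another. As a rough check, assuming the usual definitions (CV is the standard deviation divided by the mean, variance is the squared standard deviation, IQR is Q3 minus Q1), the following sketch recomputes them from the values reported for COPAISINF. The variable names are illustrative only.

```python
# Values copied from the COPAISINF statistics in this report.
mean = 7.786163522
std = 32.27264484
minimum, maximum = 0, 199
q1, q3 = 0, 1

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.6f} variance={variance:.3f} range={value_range} IQR={iqr}")
```

The recomputed values agree with the reported ones up to rounding of the printed inputs.
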
Histogram with fixed size bins (bins=10)

Common Values

Value  Count  Frequency (%)
0      115    72.3%
1      30     18.9%
31     6      3.8%
199    2      1.3%
154    1      0.6%
153    1      0.6%
150    1      0.6%
138    1      0.6%
22     1      0.6%
7      1      0.6%

Minimum 10 values

Value  Count  Frequency (%)
0      115    72.3%
1      30     18.9%
7      1      0.6%
22     1      0.6%
31     6      3.8%
138    1      0.6%
150    1      0.6%
153    1      0.6%
154    1      0.6%
199    2      1.3%

Maximum 10 values

Value  Count  Frequency (%)
199    2      1.3%
154    1      0.6%
153    1      0.6%
150    1      0.6%
138    1      0.6%
31     6      3.8%
22     1      0.6%
7      1      0.6%
1      30     18.9%
0      115    72.3%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct14
Distinct (%)8.8%
Missing0
Missing (%)0.0%
Memory size9.7 KiB

Length

Max length6
Median length0
Mean length1.132075472
Min length0

Characters and Unicode

Total characters180
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8 ?
Unique (%)5.0%

Sample

1st row130260
2nd row
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(empty)           129    81.1%
330340            10     6.3%
130380            5      3.1%
130260            3      1.9%
330580            2      1.3%
330240            2      1.3%
110020            1      0.6%
130070            1      0.6%
352120            1      0.6%
130250            1      0.6%
Other values (4)  4      2.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
33034010
33.3%
1303805
16.7%
1302603
 
10.0%
3305802
 
6.7%
3302402
 
6.7%
1100201
 
3.3%
1300701
 
3.3%
3521201
 
3.3%
1302501
 
3.3%
3302901
 
3.3%
Other values (3)3
 
10.0%

Most occurring characters

ValueCountFrequency (%)
062
34.4%
360
33.3%
115
 
8.3%
213
 
7.2%
413
 
7.2%
87
 
3.9%
54
 
2.2%
63
 
1.7%
92
 
1.1%
71
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number180
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
062
34.4%
360
33.3%
115
 
8.3%
213
 
7.2%
413
 
7.2%
87
 
3.9%
54
 
2.2%
63
 
1.7%
92
 
1.1%
71
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common180
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
062
34.4%
360
33.3%
115
 
8.3%
213
 
7.2%
413
 
7.2%
87
 
3.9%
54
 
2.2%
63
 
1.7%
92
 
1.1%
71
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII180
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
062
34.4%
360
33.3%
115
 
8.3%
213
 
7.2%
413
 
7.2%
87
 
3.9%
54
 
2.2%
63
 
1.7%
92
 
1.1%
71
 
0.6%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)8.2%
Missing0
Missing (%)0.0%
Memory size9.6 KiB

Length

Max length4
Median length0
Mean length0.3710691824
Min length0

Characters and Unicode

Total characters59
Distinct characters18
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9 ?
Unique (%)5.7%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(empty)           144    90.6%
VALE              2      1.3%
BICU              2      1.3%
LUMI              2      1.3%
GUIA              1      0.6%
MACA              1      0.6%
MORR              1      0.6%
PARQ              1      0.6%
MOCA              1      0.6%
AMAZ              1      0.6%
Other values (3)  3      1.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
lumi2
13.3%
bicu2
13.3%
vale2
13.3%
morr1
6.7%
moca1
6.7%
parq1
6.7%
maca1
6.7%
guia1
6.7%
alde1
6.7%
las1
6.7%
Other values (2)2
13.3%

Most occurring characters

ValueCountFrequency (%)
A12
20.3%
M6
10.2%
L6
10.2%
U5
8.5%
I5
8.5%
C4
 
6.8%
O3
 
5.1%
E3
 
5.1%
R3
 
5.1%
V2
 
3.4%
Other values (8)10
16.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter59
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A12
20.3%
M6
10.2%
L6
10.2%
U5
8.5%
I5
8.5%
C4
 
6.8%
O3
 
5.1%
E3
 
5.1%
R3
 
5.1%
V2
 
3.4%
Other values (8)10
16.9%

Most occurring scripts

ValueCountFrequency (%)
Latin59
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A12
20.3%
M6
10.2%
L6
10.2%
U5
8.5%
I5
8.5%
C4
 
6.8%
O3
 
5.1%
E3
 
5.1%
R3
 
5.1%
V2
 
3.4%
Other values (8)10
16.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII59
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A12
20.3%
M6
10.2%
L6
10.2%
U5
8.5%
I5
8.5%
C4
 
6.8%
O3
 
5.1%
E3
 
5.1%
R3
 
5.1%
V2
 
3.4%
Other values (8)10
16.9%

DEXAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct119
Distinct (%)74.8%
Missing0
Missing (%)0.0%
Memory size10.5 KiB

Length

Max length10
Median length10
Mean length9.962264151
Min length4

Characters and Unicode

Total characters1584
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
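
As an illustration of how such a breakdown can be obtained, here is a minimal sketch that counts Unicode general categories with Python's standard `unicodedata` module. It is not necessarily how this report was produced, and the helper name `category_counts` is hypothetical. The two sample values mimic the shapes seen in this column (an ISO date and the literal string "None"). The standard library exposes general categories but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code:
            # Nd = Decimal Number, Pd = Dash Punctuation,
            # Lu = Uppercase Letter, Ll = Lowercase Letter.
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(["2016-01-26", "None"])
print(dict(counts))
```

A date contributes eight decimal digits and two dashes, while "None" contributes one uppercase and three lowercase letters, which is consistent with the four categories listed for this variable.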

Unique

Unique91 ?
Unique (%)57.2%

Sample

1st row2016-01-04
2nd row2016-01-12
3rd row2016-01-19
4th row2016-01-13
5th row2016-01-14

Common Values

Value               Count  Frequency (%)
2016-01-26          5      3.1%
2016-02-02          4      2.5%
2016-02-15          3      1.9%
2016-03-15          3      1.9%
2016-05-04          3      1.9%
2016-02-01          3      1.9%
2016-06-09          3      1.9%
2016-02-04          3      1.9%
2016-02-03          3      1.9%
2016-12-21          2      1.3%
Other values (109)  127    79.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016-01-265
 
3.1%
2016-02-024
 
2.5%
2016-02-043
 
1.9%
2016-03-153
 
1.9%
2016-02-013
 
1.9%
2016-02-153
 
1.9%
2016-05-043
 
1.9%
2016-06-093
 
1.9%
2016-02-033
 
1.9%
2016-04-042
 
1.3%
Other values (109)127
79.9%

Most occurring characters

ValueCountFrequency (%)
0358
22.6%
-316
19.9%
1288
18.2%
2258
16.3%
6195
12.3%
339
 
2.5%
436
 
2.3%
926
 
1.6%
526
 
1.6%
819
 
1.2%
Other values (5)23
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1264
79.8%
Dash Punctuation316
 
19.9%
Lowercase Letter3
 
0.2%
Uppercase Letter1
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0358
28.3%
1288
22.8%
2258
20.4%
6195
15.4%
339
 
3.1%
436
 
2.8%
926
 
2.1%
526
 
2.1%
819
 
1.5%
719
 
1.5%
Lowercase Letter
ValueCountFrequency (%)
o1
33.3%
n1
33.3%
e1
33.3%
Dash Punctuation
ValueCountFrequency (%)
-316
100.0%
Uppercase Letter
ValueCountFrequency (%)
N1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1580
99.7%
Latin4
 
0.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0358
22.7%
-316
20.0%
1288
18.2%
2258
16.3%
6195
12.3%
339
 
2.5%
436
 
2.3%
926
 
1.6%
526
 
1.6%
819
 
1.2%
Latin
ValueCountFrequency (%)
N1
25.0%
o1
25.0%
n1
25.0%
e1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1584
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0358
22.6%
-316
19.9%
1288
18.2%
2258
16.3%
6195
12.3%
339
 
2.5%
436
 
2.3%
926
 
1.6%
526
 
1.6%
819
 
1.2%
Other values (5)23
 
1.5%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size10.4 KiB

Length

Max length2
Median length1
Mean length1
Min length0

Characters and Unicode

Total characters159
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)2.5%

Sample

1st row4
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value    Count  Frequency (%)
1        113    71.1%
4        33     20.8%
2        9      5.7%
10       1      0.6%
8        1      0.6%
6        1      0.6%
(empty)  1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1113
71.5%
433
 
20.9%
29
 
5.7%
101
 
0.6%
81
 
0.6%
61
 
0.6%

Most occurring characters

ValueCountFrequency (%)
1114
71.7%
433
 
20.8%
29
 
5.7%
81
 
0.6%
61
 
0.6%
01
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number159
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1114
71.7%
433
 
20.8%
29
 
5.7%
81
 
0.6%
61
 
0.6%
01
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common159
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1114
71.7%
433
 
20.8%
29
 
5.7%
81
 
0.6%
61
 
0.6%
01
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII159
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1114
71.7%
433
 
20.8%
29
 
5.7%
81
 
0.6%
61
 
0.6%
01
 
0.6%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct37
Distinct (%)97.4%
Missing121
Missing (%)76.1%
Infinite0
Infinite (%)0.0%
Mean9504.131579
Minimum2
Maximum110520
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.4 KiB

Quantile statistics

Minimum                    2
5-th percentile            59.2
Q1                         344
median                     1180
Q3                         3910
95-th percentile           82076
Maximum                    110520
Range                      110518
Interquartile range (IQR)  3566

Descriptive statistics

Standard deviation               25151.21433
Coefficient of variation (CV)    2.646345341
Kurtosis                         9.909690787
Mean                             9504.131579
Median Absolute Deviation (MAD)  1108
Skewness                         3.293929594
Sum                              361157
Variance                         632583582.1
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=37)

Common Values

Value              Count  Frequency (%)
501                2      1.3%
110520             1      0.6%
200                1      0.6%
1200               1      0.6%
2                  1      0.6%
592                1      0.6%
1040               1      0.6%
2600               1      0.6%
1840               1      0.6%
1160               1      0.6%
Other values (27)  27     17.0%
(Missing)          121    76.1%

Minimum 10 values

Value  Count  Frequency (%)
2      1      0.6%
32     1      0.6%
64     1      0.6%
80     1      0.6%
144    1      0.6%
200    1      0.6%
211    1      0.6%
310    1      0.6%
318    1      0.6%
320    1      0.6%

Maximum 10 values

Value   Count  Frequency (%)
110520  1      0.6%
86360   1      0.6%
81320   1      0.6%
12480   1      0.6%
12160   1      0.6%
8120    1      0.6%
7240    1      0.6%
6000    1      0.6%
4400    1      0.6%
4000    1      0.6%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size9.8 KiB

Length

Max length1
Median length0
Mean length0.2830188679
Min length0

Characters and Unicode

Total characters45
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row4
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  114    71.7%
4        20     12.6%
1        8      5.0%
3        7      4.4%
2        5      3.1%
5        4      2.5%
6        1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
420
44.4%
18
 
17.8%
37
 
15.6%
25
 
11.1%
54
 
8.9%
61
 
2.2%

Most occurring characters

ValueCountFrequency (%)
420
44.4%
18
 
17.8%
37
 
15.6%
25
 
11.1%
54
 
8.9%
61
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number45
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
420
44.4%
18
 
17.8%
37
 
15.6%
25
 
11.1%
54
 
8.9%
61
 
2.2%

Most occurring scripts

ValueCountFrequency (%)
Common45
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
420
44.4%
18
 
17.8%
37
 
15.6%
25
 
11.1%
54
 
8.9%
61
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII45
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
420
44.4%
18
 
17.8%
37
 
15.6%
25
 
11.1%
54
 
8.9%
61
 
2.2%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size9.9 KiB

Length

Max length2
Median length0
Mean length0.4088050314
Min length0

Characters and Unicode

Total characters65
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.9%

Sample

1st row1
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  115    72.3%
1        21     13.2%
99       18     11.3%
12       2      1.3%
11       1      0.6%
4        1      0.6%
2        1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
121
47.7%
9918
40.9%
122
 
4.5%
111
 
2.3%
41
 
2.3%
21
 
2.3%

Most occurring characters

ValueCountFrequency (%)
936
55.4%
125
38.5%
23
 
4.6%
41
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number65
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
936
55.4%
125
38.5%
23
 
4.6%
41
 
1.5%

Most occurring scripts

ValueCountFrequency (%)
Common65
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
936
55.4%
125
38.5%
23
 
4.6%
41
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII65
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
936
55.4%
125
38.5%
23
 
4.6%
41
 
1.5%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)11.3%
Missing0
Missing (%)0.0%
Memory size9.9 KiB

Length

Max length30
Median length0
Mean length2.622641509
Min length0

Characters and Unicode

Total characters417
Distinct characters34
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique17 ?
Unique (%)10.7%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value                           Count  Frequency (%)
(empty)                         142    89.3%
CQ(4+4+4) E PQ30MG/21DIAS       1      0.6%
CLOROQUINA10CP+PRIMAQ15CP       1      0.6%
ARTESUNATO+CLINDAMICINA         1      0.6%
ARTESUNATO+MEFLOQUINA           1      0.6%
CERAQUINA+PRIMARQUINA           1      0.6%
10 CLORIQ 28 PRIMAQ             1      0.6%
CLOROQUINA+PRINAQUINA           1      0.6%
CLOROQUINA3DIAS+PRIMAQUINA9DIA  1      0.6%
CLOROQUINA 10CP+PRIMAQUINA15CP  1      0.6%
Other values (8)                8      5.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
cloroquina2
 
5.9%
dias2
 
5.9%
cloriq1
 
2.9%
primaq1
 
2.9%
cloroquina+prinaquina1
 
2.9%
161
 
2.9%
cloroquina+primoquina1
 
2.9%
arte+mef+prima1
 
2.9%
4+3+3prima1
 
2.9%
prinaqui1
 
2.9%
Other values (22)22
64.7%

Most occurring characters

ValueCountFrequency (%)
A45
 
10.8%
I40
 
9.6%
R35
 
8.4%
O30
 
7.2%
+22
 
5.3%
Q22
 
5.3%
N22
 
5.3%
M21
 
5.0%
P20
 
4.8%
C19
 
4.6%
Other values (24)141
33.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter338
81.1%
Decimal Number36
 
8.6%
Math Symbol22
 
5.3%
Space Separator17
 
4.1%
Other Punctuation2
 
0.5%
Open Punctuation1
 
0.2%
Close Punctuation1
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A45
13.3%
I40
11.8%
R35
10.4%
O30
8.9%
Q22
 
6.5%
N22
 
6.5%
M21
 
6.2%
P20
 
5.9%
C19
 
5.6%
U18
 
5.3%
Other values (9)66
19.5%
Decimal Number
ValueCountFrequency (%)
18
22.2%
37
19.4%
06
16.7%
46
16.7%
24
11.1%
52
 
5.6%
61
 
2.8%
91
 
2.8%
81
 
2.8%
Other Punctuation
ValueCountFrequency (%)
,1
50.0%
/1
50.0%
Math Symbol
ValueCountFrequency (%)
+22
100.0%
Space Separator
ValueCountFrequency (%)
17
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin338
81.1%
Common79
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
A45
13.3%
I40
11.8%
R35
10.4%
O30
8.9%
Q22
 
6.5%
N22
 
6.5%
M21
 
6.2%
P20
 
5.9%
C19
 
5.6%
U18
 
5.3%
Other values (9)66
19.5%
Common
ValueCountFrequency (%)
+22
27.8%
17
21.5%
18
 
10.1%
37
 
8.9%
06
 
7.6%
46
 
7.6%
24
 
5.1%
52
 
2.5%
,1
 
1.3%
61
 
1.3%
Other values (5)5
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII417
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A45
 
10.8%
I40
 
9.6%
R35
 
8.4%
O30
 
7.2%
+22
 
5.3%
Q22
 
5.3%
N22
 
5.3%
M21
 
5.0%
P20
 
4.8%
C19
 
4.6%
Other values (24)141
33.8%

DTRATA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct32
Distinct (%)20.1%
Missing0
Missing (%)0.0%
Memory size9.9 KiB

Length

Max length10
Median length4
Mean length5.660377358
Min length4

Characters and Unicode

Total characters900
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique24 ?
Unique (%)15.1%

Sample

1st row2016-01-04
2nd rowNone
3rd rowNone
4th rowNone
5th rowNone

Common Values

Value              Count  Frequency (%)
None               115    72.3%
2016-01-26         5      3.1%
2016-02-04         4      2.5%
2016-02-01         3      1.9%
2016-01-22         2      1.3%
2016-03-03         2      1.3%
2016-02-02         2      1.3%
2016-11-23         2      1.3%
2016-04-08         1      0.6%
2016-07-07         1      0.6%
Other values (22)  22     13.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none115
72.3%
2016-01-265
 
3.1%
2016-02-044
 
2.5%
2016-02-013
 
1.9%
2016-02-022
 
1.3%
2016-01-222
 
1.3%
2016-03-032
 
1.3%
2016-11-232
 
1.3%
2016-03-161
 
0.6%
2016-01-181
 
0.6%
Other values (22)22
 
13.8%

Most occurring characters

ValueCountFrequency (%)
N115
12.8%
o115
12.8%
n115
12.8%
e115
12.8%
0106
11.8%
-88
9.8%
277
8.6%
177
8.6%
652
5.8%
412
 
1.3%
Other values (5)28
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number352
39.1%
Lowercase Letter345
38.3%
Uppercase Letter115
 
12.8%
Dash Punctuation88
 
9.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0106
30.1%
277
21.9%
177
21.9%
652
14.8%
412
 
3.4%
312
 
3.4%
86
 
1.7%
54
 
1.1%
74
 
1.1%
92
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
o115
33.3%
n115
33.3%
e115
33.3%
Dash Punctuation
ValueCountFrequency (%)
-88
100.0%
Uppercase Letter
ValueCountFrequency (%)
N115
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin460
51.1%
Common440
48.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0106
24.1%
-88
20.0%
277
17.5%
177
17.5%
652
11.8%
412
 
2.7%
312
 
2.7%
86
 
1.4%
54
 
0.9%
74
 
0.9%
Latin
ValueCountFrequency (%)
N115
25.0%
o115
25.0%
n115
25.0%
e115
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N115
12.8%
o115
12.8%
n115
12.8%
e115
12.8%
0106
11.8%
-88
9.8%
277
8.6%
177
8.6%
652
5.8%
412
 
1.3%
Other values (5)28
 
3.1%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing159
Missing (%)100.0%
Memory size1.4 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
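
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the report's own implementation; the function name `pearson_r` is hypothetical, and the sample covariance and standard deviations both use the n - 1 denominator (the factor cancels in the ratio).

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # A constant input has zero standard deviation, so r is undefined there.
    return cov / (std_x * std_y)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(xs, ys)
# Shifting and positively rescaling ys leaves r unchanged.
r_rescaled = pearson_r(xs, [10.0 * y + 3.0 for y in ys])
print(r, r_rescaled)
```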

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
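
A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of their ranks (a common convention, not confirmed by this report). The helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x**3) still gives rho = 1.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
print(rho)
```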

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
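
The pair-counting definition can be sketched directly. This illustrates the simple variant usually called tau-a; library implementations often use tie-adjusted variants, so results may differ when the data contain ties. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in xs or ys; it counts toward neither group
    return (concordant - discordant) / len(pairs)


tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

In this example two of the three pairs are concordant and one is discordant, so τ = (2 - 1) / 3.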

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
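
For illustration, the bias-corrected estimator can be sketched from a contingency table of observed counts. This is a minimal version under stated assumptions: it uses the plain chi-squared statistic without any continuity correction, and the function name `cramers_v_corrected` is hypothetical, so it may not match the report's implementation exactly.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
print(v_independent, v_perfect)
```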

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
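
One way to read the nullity correlation is as an ordinary Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing). The sketch below assumes that interpretation and uses `None` to mark missing values; the function name and the toy columns are hypothetical and do not come from this dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A fully present or fully missing column has no variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


col_a = [1, None, 3, None]
col_b = [5, None, 7, None]   # missing exactly where col_a is missing
col_c = [None, 2, None, 4]   # present exactly where col_a is missing

same = nullity_correlation(col_a, col_b)
opposite = nullity_correlation(col_a, col_c)
print(same, opposite)
```

A value near +1 means two variables tend to be missing together, while a value near -1 means one tends to be present when the other is missing.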

Sample

First rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
02B542016-01-0420160120163333045522883382015-12-282015521994-07-304021F5909333304551NaT99999114112AM11302602016-01-0448120.0412016-01-04NaT
12B542016-01-1220160220163333045522883382015-12-232015511971-02-044044F5209333304551NaT9999922101102016-01-121NaNNoneNaT
22B542016-01-1320160220163333034022981042016-01-072016011989-01-274026M6107333303401NaT2631102111102016-01-191NaNNoneNaT
32B542016-01-1320160220163333045522883382015-12-132015502006-12-074009F6202333304551NaT9999912111202016-01-131NaNNoneNaT
42B542016-01-1420160220163333024022765342016-01-132016021995-07-094020M6909333302401NaT2992202016-01-141NaNNoneNaT
52B542016-01-1520160220163333045522966322016-01-072016011979-07-094036F52333304551NaT1101121542016-01-152211.0222016-01-15NaT
62B542016-01-1820160320163333045522883382016-01-142016021997-07-294018F5909333304551NaT999991110112312016-01-18281320.0599PRIMA+ARTE+MEF+BROMOPRIDA2016-01-18NaT
72B542016-01-2120160320163333024022765342016-01-162016021986-11-054029M6209333302401NaT914405891102016-01-212NaN1NoneNaT
82B542016-01-2220160320163333045522883382016-01-162016021986-11-054029M6209333301701NaT1912222MOCA2016-01-222480.03992016-01-22NaT
92B542016-01-2220160320163333045522883382016-01-112016021959-06-014056M6909333304551NaT35460514112312016-01-2282360.0499ARTE+MEF+PRIMA2016-01-22NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
1492B542016-12-1420165020163333045522883382016-12-132016501982-05-274034F5108333304551NaT2101102016-12-141NaNNoneNaT
1502B542016-12-1420165020163333045522883382016-12-052016491973-03-084043M6906333304551NaT2111102016-12-141NaNNoneNaT
1512B542016-12-1520165020163333045522883382016-12-012016481984-07-214032M6408333304551NaT2611252111102016-12-151NaNNoneNaT
1522B542016-12-1720165020163333045530034502016-12-162016501939-08-144077M6108333304551NaT2142052111102016-12-171NaNNoneNaT
1532B542016-12-1920165120163333045522883382016-12-192016512003-06-264013F5903333303501NaT9999912101102016-12-191NaNNoneNaT
1542B542016-12-2120165120163333034022980902016-10-282016431982-02-074034M6109333302451NaT2111102016-12-211NaNNoneNaT
1552B542016-12-2120165120163333045522883382016-12-152016501972-09-044044M6908333304551NaT2142052111102016-12-211NaNNoneNaT
1562B542016-12-2220165120163333034022718772016-12-202016511961-11-244055F6108333303401NaT7521052111102016-12-271NaNNoneNaT
1572B542016-12-2620165220163333045522883382016-12-222016511975-08-154041F5908333304551NaT2625052101102016-12-261NaNNoneNaT
1582B542016-12-2920165220163333045522883382016-12-272016521979-08-024037M6906333305551NaT2111102016-12-291NaNNoneNaT